Loading the tidyverse

The tidyverse is an opinionated collection of R packages designed for data science. Make sure that you have the tidyverse installed by typing install.packages("tidyverse") into the Console in the bottom left corner. You only need to do this once.
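Installation is a one-time step, but the packages still have to be attached in every new R session. A minimal sketch, assuming the installation above completed without errors:

```r
# Attach the core tidyverse packages (dplyr, tidyr, tibble and others)
library(tidyverse)

# List the names of all packages that make up the tidyverse
tidyverse_packages()
```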

Tibbles

For the next little while we will work with “tibbles” instead of R’s traditional data.frame. Tibbles are data frames, but they tweak some older behaviours to make life a little easier.

To convert an existing data.frame to a tibble, use as_tibble() (the older as.tibble() spelling is deprecated).
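As a small illustration of the conversion, the sketch below turns the built-in iris data frame into a tibble with as_tibble(). The variable name iris_tbl is made up for this example.

```r
library(tibble)

# iris is a built-in data.frame with 150 rows and 5 columns
iris_tbl <- as_tibble(iris)

class(iris)      # "data.frame"
class(iris_tbl)  # "tbl_df" "tbl" "data.frame"
```

The tibble still inherits from data.frame, so functions that expect a data frame generally continue to work.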

There are two main differences in the usage of a tibble vs. a classic data.frame: printing and subsetting.

  • Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. Tibbles are designed so that you don’t accidentally overwhelm your console when you print large data frames.
  • So far, all the tools you’ve learned have worked with complete data frames. To pull out a single variable, you need two new tools: $ and [[. [[ can extract by name or by position; $ extracts by name only.

df <- tibble(
  x = runif(5),
  y = rnorm(5)
)

# Extract by name
df$x
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979
df[["x"]]
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979
# Extract by position
df[[1]]
## [1] 0.4503653 0.3935854 0.7458050 0.8709692 0.3113979

Subsetting
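Compared with a data.frame, a tibble is stricter when you subset it: $ never does partial name matching, and [ always returns another tibble. A minimal sketch, using two small made-up data frames (df_base and df_tbl are hypothetical names):

```r
library(tibble)

df_base <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))
df_tbl  <- as_tibble(df_base)

# data.frame: $ partially matches "a" to "abc",
# and [ with a single column drops down to a plain vector
df_base$a
class(df_base[, "abc"])   # "integer"

# tibble: $ does not partially match (NULL plus a warning),
# and [ always returns a tibble
suppressWarnings(df_tbl$a)
class(df_tbl[, "abc"])    # "tbl_df" "tbl" "data.frame"
```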

dplyr

Choosing Columns with select()
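select() keeps or drops columns by name. The sketch below uses the built-in mtcars data set purely for illustration; any data frame or tibble would do.

```r
library(dplyr)

# Keep only the named columns
select(mtcars, mpg, cyl, wt)

# Drop a column by prefixing it with a minus sign
select(mtcars, -carb)

# Use a selection helper: every column whose name starts with "d"
select(mtcars, starts_with("d"))   # disp and drat
```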

Renaming variables with rename()
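rename() changes column names and keeps every other column as it is. The syntax is new_name = old_name. A short sketch on the built-in mtcars data set (the new names are arbitrary choices for this example):

```r
library(dplyr)

# Rename wt to weight and cyl to cylinders; all other columns are untouched
renamed <- rename(mtcars, weight = wt, cylinders = cyl)

names(renamed)
```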

Sorting and Reordering with arrange()
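arrange() reorders rows by the values of one or more columns, ascending by default. A minimal sketch, again assuming the built-in mtcars data set:

```r
library(dplyr)

# Sort by mpg, smallest first
by_mpg <- arrange(mtcars, mpg)

# Sort by mpg, largest first
by_mpg_desc <- arrange(mtcars, desc(mpg))

# Sort by cyl, breaking ties with descending mpg
by_cyl_mpg <- arrange(mtcars, cyl, desc(mpg))
```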

Subsetting and Filtering Data with filter()
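filter() keeps the rows for which a logical condition is TRUE. Several conditions separated by commas are combined with "and". A sketch on the built-in mtcars data set:

```r
library(dplyr)

# Cars with four cylinders
four_cyl <- filter(mtcars, cyl == 4)

# Four-cylinder cars that also exceed 30 mpg (both conditions must hold)
efficient <- filter(mtcars, cyl == 4, mpg > 30)

# Cars with either four or six cylinders
small <- filter(mtcars, cyl %in% c(4, 6))
```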

Adding New Columns with mutate()
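mutate() adds new columns computed from existing ones and keeps the originals. Columns created earlier in the same call can be used by later ones. A minimal sketch on mtcars; the new column names are invented for this example:

```r
library(dplyr)

with_ratio <- mutate(
  mtcars,
  hp_per_wt         = hp / wt,              # horsepower per unit of weight
  hp_per_wt_rounded = round(hp_per_wt, 1)   # reuses the column created just above
)

head(select(with_ratio, hp, wt, hp_per_wt, hp_per_wt_rounded))
```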

Piping

The pipe, %>%, comes from the magrittr package by Stefan Milton Bache. Packages in the tidyverse load %>% for you automatically, so you don’t usually load magrittr explicitly.

The point of the pipe is to help you write code in a way that is easier to read and understand.
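To see the difference in readability, the sketch below writes the same three-step operation twice: once as nested function calls and once with the pipe. It assumes dplyr is installed and uses the built-in mtcars data set.

```r
library(dplyr)

# Without the pipe: the steps must be read from the inside out
nested <- arrange(select(filter(mtcars, cyl == 4), mpg, wt), desc(mpg))

# With the pipe: the steps read top to bottom, in the order they run
piped <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, wt) %>%
  arrange(desc(mpg))
```

x %>% f(y) is equivalent to f(x, y), so both versions produce the same result.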

tidyr

The goal of tidyr is to help you create tidy data. Tidy data is data where:

  • Every column is a variable.
  • Every row is an observation.
  • Every cell is a single value.

Reshaping Data (Wide/Long)

Wide Data vs. Long Data

There are two sets of methods that are explained below:

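  • gather() and spread() from the tidyr package. tidyr is the newer successor to reshape2. (Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(), but they still work.)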
  • melt() and dcast() from the reshape2 package.
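As a quick orientation, here is a minimal sketch of the tidyr pair only, going from wide to long and back again. The data and all column names (subject, control, cond1, condition, measurement) are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per subject, one column per condition
wide <- tibble(
  subject = c(1, 2, 3),
  control = c(7.9, 6.3, 9.5),
  cond1   = c(12.3, 10.6, 13.1)
)

# Wide -> long: stack the condition columns into key/value pairs
long <- gather(wide, key = "condition", value = "measurement", control, cond1)

# Long -> wide: spread the key column back out into separate columns
wide_again <- spread(long, key = condition, value = measurement)
```

In the long form each subject appears once per condition, so three subjects and two conditions give six rows.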